Accurate Sequence-Dependent Coarse-Grained Model for Conformational and Elastic Properties of Double-Stranded DNA

We introduce MADna, a sequence-dependent coarse-grained model of double-stranded DNA (dsDNA), where each nucleotide is described by three beads localized at the sugar, at the base moiety, and at the phosphate group, respectively. The sequence dependence is included by considering a step-dependent parametrization of the bonded interactions, which are tuned in order to reproduce the values of key observables obtained from exhaustive atomistic simulations from the literature. The predictions of the model are benchmarked against an independent set of all-atom simulations, showing that it captures with high fidelity the sequence dependence of conformational and elastic features beyond the single step considered in its formulation. A remarkably good agreement with experiments is found for both sequence-averaged and sequence-dependent conformational and elastic features, including the stretching and torsion moduli, the twist–stretch and twist–bend couplings, the persistence length, and the helical pitch. Overall, for the inspected quantities, the model has a precision comparable to atomistic simulations, hence providing a reliable coarse-grained description for the rationalization of single-molecule experiments and the study of cellular processes involving dsDNA. Owing to the simplicity of its formulation, MADna can be straightforwardly included in common simulation engines. Particularly, an implementation of the model in LAMMPS is made available on an online repository to ease its usage within the DNA research community.


■ INTRODUCTION
Sequence-dependent conformational and elastic properties of DNA are of the utmost importance for its regulation in vivo, as they directly affect DNA−protein interactions. 1 The detailed shape of a DNA fragment and its deformability are indeed key determinants of protein recognition and binding. 1,2 For instance, the unique conformational properties of A-tracts are known to affect nucleosomal organization, 3 as well as DNA replication and recombination. 4 Analogously, TATA-box elements present a strong deviation from the canonical B-DNA conformation, which is exploited to enhance binding by the TATA-box binding protein. 1 Moreover, proteins continuously exert mechanical stress on bound DNA. For example, torsion is applied on DNA by topoisomerases and polymerases 5,6 and plays a central role in chromatin remodeling. 7 Also, DNA stretching is relevant in vivo, e.g., in the action of recombinases 8 or for site recognition in nucleosomes. 9 Together with the practical implications of DNA elasticity in nanotechnological applications, 10,11 this fundamental interest has prompted a conspicuous amount of experimental efforts devoted to the detailed characterization of DNA conformational ensemble and elasticity. For the study of conformers, classical crystallographic and NMR studies are nowadays being complemented by X-ray interferometry 12,13 and by single-molecule imaging with atomic force microscopy (AFM) 14,15 or cryoelectron microscopy. 16 Single-molecule techniques like AFM, optical tweezers, and magnetic tweezers are employed to assess the elastic properties of long DNA molecules, providing stiffness values for various mechanical perturbation modes such as stretching, 17−21 twisting, 20,22−27 bending, 17−19,21,28−30 and the coupling between them. 20,31−35 At short length scales, the interpretation of experiments becomes cumbersome due to both theoretical and experimental challenges. Indeed, direct AFM imaging has provided conflicting results on the bendability of short DNA fragments. 36,37 For other experimental approaches such as cyclization assays, 38−41 the elastic parameters can be obtained only by interpreting the data within specific theoretical frameworks, and conflicting conclusions have been reported based on different physical assumptions. 12,42−44 Molecular simulations provide a valuable means to complement experimental studies, particularly in view of the discrepancies mentioned above. All-atom molecular dynamics has been successfully employed to capture DNA elastic and conformational features 45−49 as well as their dependence on sequence. 35,50−53 An evident limit of atomistic simulations originates from the associated high computational cost, which puts severe boundaries on the length and time scales that can be simulated. The need to overcome this barrier has fostered the development of coarse-grained approaches, where one selects only the degrees of freedom relevant to the problem at hand. An elegant solution is implemented in the software packages cgDNA 54 and MC-eNN, 55 where the nucleotides are represented as rigid frames and the energy of the system is described by means of a stiffness matrix. This approach is extremely efficient from a computational perspective and enables assessing the conformational ensemble of long DNA chains. 56 However, an important drawback lies in the absence of an explicit description of the backbone and the related difficulty in interfacing this representation with external perturbations present in most systems of interest, e.g., mechanical stress, confinement, or a binding protein. A good compromise between computational efficiency and modeling flexibility is achieved by considering coarse-grained models in which effective particles represent the various moieties along the DNA molecule. 57−65 These models have been employed to describe a wide palette of systems involving DNA, including, e.g., DNA origami, 66 nucleosomes, 67 and cyclization assays. 44 Despite this success, the coarse-grained models of the latter kind present in the literature do not satisfactorily capture all the main elastic features of double-stranded DNA at once (see, e.g., refs 62 and 68). In this work, we fill this gap by introducing the mechanically-accurate DNA (MADna) model, a sequence-dependent coarse-grained model whose parameters are tuned to reproduce local conformational features of double-stranded DNA and the stiffness of the various mechanical perturbation modes obtained via all-atom molecular dynamics. 52 We show that the model satisfactorily reproduces the atomistic values also for DNA fragments different than the ones employed for fitting, as well as experimental data from the literature. Particularly, and at variance with existing models, our coarse-grained description captures the negative twist−stretch coupling recently unveiled by single-molecule force spectroscopy 31,32 as well as the experimentally determined sequence dependence of dsDNA helical pitch and persistence length. 28 The full details of MADna and the definitions of the various physical and geometrical quantities are provided in Methods. Nevertheless, they are shortly introduced also in Results and Discussion, making this section somewhat self-standing. In this way, the main findings of this work are presented succinctly, leaving the interested reader to look up the technical details within the Methods section.
Finally, we remark that an online repository is available (https://github.com/saassenza/MADna/tree/main/ LAMMPS) where MADna has been implemented in LAMMPS 69 without the need to introduce additional custom packages. The repository also contains a molecular builder which automatically creates the initial coordinates and topology for simulation in LAMMPS.

■ METHODS
Coarse-Grained Model. MADna describes a molecule of double-stranded DNA (dsDNA) by considering three effective particles per nucleotide, located in the geometric centers of the phosphate group, the sugar, and the base, as proposed in the past in the 3SPN 57 and TIS 65 models. In Figure 1, two cartoons are reported to compare the atomistic and coarsegrained description of a representative sequence. Further details on the coarse-graining procedure are reported in section S1 in the Supporting Information. In spite of the similar level of coarse-graining, MADna diverges from the other models in terms of the choice of bonded interactions both in the way they are built and in their parametrization. Particularly, our choice of bonded interactions enables capturing the structural heterogeneity predicted by atomistic simulations, as will be showed below when we comment on the results in Figure 2 and Figure 5i. On the other hand, we chose to fix the doublestranded structure by means of bonded interactions, which simplifies the implementation of the model (for instance, MADna does not need modulation factors to account for anisotropic interactions) but makes it less general than the other ones as it cannot account for DNA thermodynamics. In this regard, MADna is more akin to the MARTINI representation of double-stranded DNA, where the double helix is maintained through a suitably defined elastic network. 64 We will add the possibility of strand separation in future versions of MADna. As for the parametrization, MADna extracts information from a large data set provided by atomistic simulations, while both TIS and 3SPN are parametrized by means of experiments. Experimental results are of course the ultimate source of information, but we decided to tune the parameters of MADna from atomistic simulations based on Figure 1. Atomistic (left) and coarse-grained (right) description for a representative dsDNA molecule with leading sequence 5′-CGCTACTTCGAGG-3′ in the B-DNA form as obtained by employing the NAB software. 70 In the coarse-grained cartoon, the color code is the following: sugar ↔ black; phosphate group ↔ red; adenine ↔ green; cytosine ↔ cyan; guanine ↔ yellow; thymine ↔ orange. The size of each bead is proportional to the WCA radius of the corresponding moiety.
both the far richer amount of available information and their demonstrated accuracy in capturing local conformational and elastic features of dsDNA. As we show in the Results and Discussion section, this choice is rewarded with a very good performance of MADna in capturing many distinct experimental observations on the structure and mechanical response of dsDNA.
Bonded Interactions. The double-stranded topology is fixed once and for all and is maintained by introducing several two-, three-, and four-body bonded interactions (from here on, we will refer to them as bonds, angles, and dihedrals, respectively), which connect beads within the same strand as well as providing interstrand links.
Before going into the details of the bonded interactions considered, we first proceed to clarify the nomenclature used in this section. In order to label the bonded interactions, we indicate the sugar, phosphate, and base beads as S, P, and B, respectively. In this way, a bond between, e.g., a sugar and a base is indicated as SB. Whenever the bonded interaction runs along the 5′-3′ direction of a strand, we add a corresponding tag at the two ends of the label. For instance, a bond between a sugar and a phosphate in the 5′-3′ direction is indicated as 5′-SP-3′. Analogously, we denote as 5′-SPSB-3′ a dihedral involving a sugar, a phosphate, the following sugar, and the base attached to it (ordered according to the 5′-3′ direction of the strand). In order to keep in place the double-stranded structure, some bonded interactions involve both strands. Such bonded interactions are marked by underlining their label. For instance, the bonds accounting for Watson−Crick base pairing are indicated as BB-WC (in this specific case we also added "WC" in order to clearly distinguish these bonds from 5′-BB-3′, which account for stacking interactions). As another example, we denote as 5′-PSBB-3′ a dihedral involving in the 5′-3′ direction a phosphate, a sugar, the attached base, and the base paired to it. We further note that, due to the asymmetry of the 5′-3′ direction, the order in which the same beads are considered is important. For instance, the bonds 5′-SP-3′ and 5′-PS-3′ are different from each other. This can be clearly seen by comparing the corresponding entries in Table S2 in the Supporting Information, where we show the equilibrium values computed by coarse-graining the atomistic simulations used to parametrize MADna (see below). Analogously, the dihedrals 5′-SPSP-3′ and 5′-PSPS-3′ are different from each other, as can be seen from Table S4 in the Supporting Information. For the bonds potential U bond , a harmonic function was considered: where r is the distance separating the two beads connected by the bond, k bond is the elastic constant specific to the type of bond considered, and r 0 is the corresponding equilibrium distance. An analogous formula was used for the angle potential U angle : where θ is the angle characterizing the three-body bonded interaction, k angle is the bending constant relative to the type of angle considered, and θ 0 is the equilibrium value of θ. Finally, the following formula was chosen for the dihedral potential U dihedral : where ϕ is the dihedral angle, k dihedral is the elastic constant of the particular type of dihedral considered, and ϕ 0 = ϕ min − 180°, with ϕ min being the equilibrium dihedral angle. This choice for the bonded interactions relies on the observation of normal distributions for most of the corresponding observables (see Section S2.2 in the Supporting Information). The general guideline in choosing how to build the topology was to obtain a minimal set of bonded interactions which reproduces the structural features obtained in all-atom simulations. With this spirit, we considered only bonded interactions involving up to a single step and implemented the sequence dependence by setting different values of the parameters according to the particular step under consideration. For instance, if along the sequence we have a CT step, the bond 5′-BB-3′ accounting for the stacking interaction is implemented by means of eq 1, where k bond and r 0 are equal to the values assigned to a CT step. Analogously, the dihedral 5′-PSBB-3′ is implemented by means of eq 3, where k dihedral and ϕ 0 are set to the values corresponding to CT. The same is repeated for all the step-dependent bonded interactions considered, so that 16 different values are considered for the corresponding parameters depending on the involved step. Some local features are symmetric with respect to the 5′-3′ Vertical lines separate the distinct sequences, which are listed in Section S3.1 in the Supporting Information. In the legends, the various labels correspond to the particular bases involved in the local conformation under consideration. For instance, SAT considers a SBB angle in which an adenine is bound to the sugar. Note also that the dihedrals are symmetric with respect to the inversion of the involved bases, as it just corresponds to changing the arbitrary reference strand. direction. The corresponding bonded interactions were then implemented as being dependent only on the corresponding base pair. For instance, for the bond SB we considered four possibilities (A, C, G, and T), while for the bond BB-WC there were just two choices (AT or CG). Yet, this assumes that these local features do not depend on the neighboring steps, which is not always the case. As an example, we report in Figure 2 the averages obtained by coarse-graining the atomistic simulations for the angles SBB ( Figure 2a) and the dihedrals SBBS ( Figure  2b). The angles SBB are obtained by considering a sugar, the base attached to it, and the corresponding Watson−Crick partner. For instance, we indicate as SAT the angle formed by a sugar, the adenine attached to it, and the thymine paired with the adenine. As for the dihedrals SBBS, we similarly consider a Watson−Crick base pair and the corresponding sugars. In Figure 2, the points are color-coded according to a classification based on a single base pair. Hence, four possible choices are present for SBB, depending on the base being attached to the sugar (A, C, G, T), while only two choices are available for SBBS, corresponding to the Watson−Crick base pairs AT and CG. As shown in Figure 2a, the data for SBB nicely cluster around their average values, indicating that this angle is in practice independent of the rest of the sequence of the hosting DNA molecule and that it can thus be considered for bonded interactions depending only on the base pair. In contrast, the dihedrals SBBS show a wide variability, particularly in the case of CG base pairs (Figure 2b). This demonstrates that these dihedrals depend on their neighborhood and are thus not apt for implementing bonded interactions depending only on the base pair. As we show in the Results and Discussion, the heterogeneity of the SBBS dihedrals is quantitatively captured by MADna as an emergent property resulting from the interplay between the other bonded interactions. We also note that this is a distinguishing feature of MADna when compared to the TIS 65 and 3SPN2 71 models, where instead the base-pair-dependent SBBS interactions are introduced as part of the model, thus preventing the possibility of capturing the heterogeneity suggested by the atomistic simulations.
The choice of the bonded interactions was obtained in practice by trial and error: starting from a far too simplistic description, where the beads were barely kept together by a minimal set of bonds, we incrementally added bonded interactions. At each iteration, we ran simulations with the coarse-grained model and computed the averages obtained for local conformational features that were not yet directly implemented. We then compared these averages with the results obtained by coarse-graining the atomistic trajectories. If the comparison was not good, we then proceeded to add other bonded interactions. The iterative procedure stopped when we reached a satisfactory comparison. We also considered an opposite approach, where we started by considering far too many bonded interactions and deleted some of them until further eliminations resulted in disrupting the structure. The process has some arbitrarity in the properties being monitored, the order in which we add bonded interaction, etc. Yet, the high fidelity of the coarse-grained model in reproducing the atomistic values (as shown in the Results and Discussion) validates the final choice of bonded parameters that we considered. The final set of employed bonds, angles, and dihedrals is reported in Figure 3, together with some representative examples based on the dsDNA molecule sketched in Figure 1. Some of these bonded interactions are directly related to physically meaningful features, such as hydrogen bonds between Watson−Crick pairs (labeled as BB-WC in Figure 3) and stacking interactions (5′-BB-3′ in Figure  3). Summing up, there are in total 202 distinct bonded interactions in the model. Further details are reported in Section S2 in the Supporting Information. Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article Importantly, since all the bonded interactions involve at most two consecutive base pairs, any feature at scales larger than a single step is an emergent property originating from the propagation of the local interactions in combination with the electrostatic repulsion between phosphates (see below).
Excluded Volume. Excluded-volume interactions were implemented by means of a Weeks−Chandler−Andersen (WCA) potential, i.e., by retaining the repulsive part of a Lennard-Jones interaction: l In the previous formula, r is the distance between the two particles, ϵ = 1 kcal/mol, while σ depends on the bead considered, as reported in Table S1 in the Supporting Information.
Electrostatics. In order to account for the presence of charges along the backbone, a charge q·e 0 was assigned to the beads corresponding to phosphate groups, with e 0 = 1.6 × 10 −19 C being the elementary charge. Although each phosphate carries a unit negative charge, in order to account for counterions condensation an effective reduced value q = −0.6 was considered. 65,71,72 The salt-induced electrostatic screening was modeled via a Debye−Huckel interaction: 73 In the previous formula, ϵ 0 = 8.859 × 10 −12 F/m is the absolute permittivity; ϵ r = 78.3 is the relative permittivity of water; and l D is the Debye length, defined as where k B = 1.38 × 10 −23 J/K is Boltzmann's constant; T is the temperature of the system in Kelvin; N A = 6.022 × 10 23 mol −1 is Avogadro's constant; and I is the ionic strength of the solution in mM. Determination of Parameters. The parameters of the various sequence-dependent bonded interactions were tuned in order to reproduce the results of atomistic simulations from ref 52, which were performed in AMBER14 74 with the parm99 force field 75 including the bsc0 modifications. 76 In those simulations, dsDNA molecules with different sequences were pulled under the action of forces ranging from 1 to 20 pN. The main details of the atomistic simulations are reported in Section S2.1 in the Supporting Information. Here, we used a subset of the atomistic simulations covering all possible base steps (we refer to this subset as "training sequences", see Section S3.1 in the Supporting Information for the sequences) to determine the bonded parameters, while the rest of sequences were employed as test cases to benchmark the coarse-grained model ("testing sequences", also reported in Section S3.1 in the Supporting Information). It has to be noted that, although parm99+bsc0 simulations reproduce well most structural features determined in experiments, 77 more precise modifications have been recently introduced, namely bsc1 78 and OL15. 79 Our reason to choose bsc0 was rooted in the convenience of having the atomistic data already available in our group, 46,52 together with the reasonable precision of these modifications. 77 Future developments of MADna will include the reparameterization of the model via atomistic simulations based on bsc1 or OL15.
A first estimation of the parameters was obtained via Boltzmann inversion of the all-atom simulations performed at 1 pN. In this regard, the atomistic trajectories were first coarsegrained according to the three-beads representation. Then, for each bond, angle, and dihedral the ensemble averages were computed from the coarse-grained trajectories, which enabled fixing the values of r 0 , θ 0 , and ϕ 0 . Analogously, the elastic constants were determined in order to reproduce the size of fluctuations starting from eqs 1, 2, and 3. Further details on this procedure can be found in Section S2.2 in the Supporting Information.
Coarse-grained simulations performed using the obtained force field showed that the set of parameters obtained by Boltzmann inversion provides reasonable values for the elastic constants (see Figure S1 in the Supporting Information). Nevertheless, some of the parameters were further tuned in order to improve the quantitative agreement with the atomistic simulations, focusing on reproducing the elastic constants of the training sequences (see Results and Discussion and Section S2.2 in the Supporting Information).
Molecular Dynamics Simulations. All the simulations were performed in LAMMPS (http://lammps.sandia.gov 69 ). The temperature T = 300 K was maintained through a Langevin thermostat with damping constant τ damp = 20 ps. The integration step was set to 20 fs.
Benchmark Simulations. In order to benchmark the coarsegrained model, we performed pulling simulations of the training and testing sequences following the same protocol as in refs 46 and 52. A pulling force f was applied to the center of mass ξ 2 of the sugars belonging to the second base pair. Analogously, an opposite force −f was applied to the center of mass ξ l seq −1 of the second-to-last base pair (l seq is the length of the sequence) . The direction of the forces was taken along the line connecting ξ 2 and ξ l seq −1 . Simulations were performed for f = 1, 5, 10, 15, and 20 pN. For each simulation, the system was initialized by considering the average structure created by the molecular builder (defined below), and the equations of motion were integrated for 20 ns. In order to ensure full equilibration, only the last 10 ns were considered for analysis. The convergence of a representative simulation is reported in Figure S8 in the Supporting Information. For each sequence and force, 100 independent simulations were performed. The ionic strength I was set at the same value as in the atomistic simulations from refs 46 and 52 and was computed as I = N ions /N A V, where N ions is the number of counterions and V is the volume of the simulation box.
Persistence-Length Simulations. For the computation of the sequence-averaged persistence length, we considered 20 random sequences made of 100 base pairs. The sequences are listed in Section S3.2 in the Supporting Information. For each realization, the molecule was initialized by considering the average structure obtained by the molecular builder (see below), and the simulation was run for 500 ns. In order to ensure full equilibration, the first 100 ns were discarded from the analysis. The convergence of a representative trajectory is reported in Figure S9 in the Supporting Information. For each sequence, 10 independent simulations were performed. The ionic strength was set at I = 150 mM.
Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article In a second set of simulations, we studied the sequence dependence of persistence length and helical pitch and compared our results with the experiments from ref 28. We considered 14 sequences of 100 base pairs obtained as the central fragments of the corresponding experimental ones. 28 The sequences are listed in Section S3.2 in the Supporting Information. The same protocol as for the sequence-averaged study was applied. The ionic strength was set at I = 1000 mM to ease the comparison with the predictions of CGDNA. 56 The convergence of a representative trajectory is reported in Figure  S10 in the Supporting Information. Simulations were performed with MADna and, for comparison, with oxDNA2 80 (using the LAMMPS implementation 81 and considering sequence-dependent stacking interactions) and the sequence-dependent model 3SPN2C. 82 Stretch−Torsion Simulations. In order to determine the elastic constants, a simulation setup was considered where a 40bp dsDNA molecule was subjected to the simultaneous action of a pulling force f and a torque τ directed along the z axis and with constant magnitude.
For each simulation, the molecule was initialized via the molecular builder and was aligned with the z axis. The two bottom base pairs and the corresponding sugars and phosphates were tethered to their initial position via harmonic constraints with constant 100 kcal/mol Å 2 . A constant pulling force f oriented along the positive z direction was exerted on the center of the sugars belonging to the second base pair counting from the top. A harmonic constraint with constant 100 kcal/mol Å 2 was also applied to the x and y coordinates of the center to keep the molecule aligned with the z axis and avoid spurious effects originating from the overall orientational entropy. 83 Finally, a constant torque τ was imposed on the two top base pairs and the corresponding sugars and phosphates.
Simulations were performed at τ = 0 for values of the force f To account for sequence-induced heterogeneity, five random sequences were generated, which we refer to as sequences ST1, ..., ST5 (the sequences are listed in Section S3.3 in the Supporting Information). Moreover, to check the dependence of the elastic constants on the length of the dsDNA molecule, we also performed simulations on shorter sequences containing 20 base pairs, which we refer to as sequences ST1-short, ..., ST5-short (the sequences are listed in Section S3.3 in the Supporting Information). For each combination of sequence, force, and torque, 10 independent simulations of length 250 ns were run. The first 50 ns were not considered for analysis in order to let the system equilibrate. The convergence of a representative trajectory is reported in Figure S11 in the Supporting Information. Simulations were performed with MADna, oxDNA2, and 3SPN2C. In all cases, the ionic strength was set at 150 mM. The four base pairs at each end were discarded from the analysis.
Finally, in order to study the effect of sequence on the elasticity, we considered another set of simulations with phased A-tracts, which are listed in Section S3.3 in the Supporting Information. The same protocol as for the rest of the simulations was applied.
Twist−Bend Coupling Simulations. In order to determine the softer response to torque due to the coupling between twist and bending, we performed simulations similar to those of the previous section. Here, 150bp long sequences were placed under the action of a pulling force directed along the z axis and acting on one end of the molecule. The other end was tethered with the same protocol as in the previous section. Note that the pulled end is here free to move in the x and y directions in order to let the molecule bend.
Simulations were performed for values of the force f in steps of 0.05 pN within the range 0.05 pN ≤ f ≤ 0.50 pN and in steps of 0.25 pN within the range 0.50 pN ≤ f ≤ 2.50 pN. Three different sequences were considered, which we refer to as TB1, TB2, and TB3 (the sequences are listed in Section S3.4 in the Supporting Information). For each combination of sequence and force, three independent simulations of length 2000 ns were run. The first 1000 ns were not considered for analysis in order to let the system equilibrate. The convergence of a representative trajectory is reported in Figure S12 in the Supporting Information. Simulations were performed with MADna, 3SPN2C, and oxDNA2.
Definitions and Protocols. Basic Definitions. We will indicate as P s,i , S s,i , and B s,i the position vectors of the beads corresponding to the ith phosphate group, sugar, and base, respectively, and located on strand s = 1, 2. The index i is counted in the 5′-3′ direction of the arbitrarily chosen strand 1. Note that i = 1, ..., n for sugar and base beads, while i = 2, ..., n for the phosphate groups, where n is the total number of base pairs (compare Figure 1). Hence, there are in total N = 2[(n − 1) + n + n] = 6n − 2 beads, where the factor of 2 is included to account for the presence of the two strands. Handles were always discarded from the analysis, so the various indexes introduced here refer to the subfragment under consideration.
Crookedness. The crookedness β accounts for the deviation of a dsDNA molecule from a straight line 52 (Figure 4b). For short fragments (such as the training and testing sequences considered here), thermally induced bending is negligible; hence, in this case β effectively quantifies the spontaneous bending of the molecule. The definition of β is here modified with respect to its original formulation to adapt it to the coarse-grained context. The contour length L of the line connecting the base-pair centers along the molecule is L = If the line connecting the centers is perfectly straight, then β = 0. The more curved the line, the larger the corresponding β. As a reference, we computed the crookedness for the conformations obtained via NAB 70 for the training and testing sequences, obtaining an average value β ≃ 0.17.
Helical Parameters. Due to the limited information on local coordinates, the definition of the helical parameters has to be adapted to the coarse-grained framework. Particularly, the representation of the bases as single beads prevents defining the local reference frames, which are crucial for the standard definitions of helical parameters. 84,85 Taking advantage of the short length of the training and testing sequences, the helical axis ĥin this case was defined globally as the axis of the cylinder best fitting the position of phosphates in the double helix (Figure 4a; see Section S4.1 in the Supporting Information for the technical details), which was also employed to estimate the diameter of the DNA. In order to define the h-twist, we first introduce the sugar separation vector ĥis the projection of ζ i onto the plane perpendicular to the helical axis. Following the usual convention, 84 the sign of the h-twist was determined according to the sign of the product (ζ i r × ζ i+1 r )·h. The employment of the positions of the sugars in the definition of the h-twist was motivated by the high correlation found with the standard definition in 3DNA 84 (see Figure S2 in the Supporting Information). Finally, the h-rise for the step i, i + 1 was computed as h-rise i,i+1 = (ξ i+1 −ξ i )·h, where ξ i ≡ (S 1,i + S 2,i )/2 is the center of the sugars of base pair i (Figure 4e).
Grooves. For a visual description of the definition of groove geometry, see Figure 4c. The quantification of the geometry of grooves was assessed in a similar way as in Curves+. 85 The positions of the phosphate groups in each strand were interpolated via centripetal Catmull-Rom splines. 86 Particularly, the helical fragment connecting P s,i and P s,i+1 was interpolated by considering P s,i−1 , P s,i , P s,i+1 , and P s,i+2 as control points. For each of the terminal phosphates, the missing external control point was built by extrapolation of the h-rise and h-twist of the last step. In order to determine the groove geometry corresponding to base pair i, we first considered the midpoint M 1,i between P 1,i and P 1,i+1 along the interpolated curve. Then, starting from the analogous midpoint M 2,i on the second strand, we followed the corresponding interpolating curve in both directions while computing the distance from M 1,i until a local minimum in the distance was first encountered at two points denoted as F +,i and F −,i . From the two identified minimal distances, we subtracted 0.58 nm to account for the van der Waals radius of the backbone 85 and assigned the resulting values to the width of the major and minor groove. In order to compute the depth of each groove, we considered the midpoints on the width vectors, i.e., 0.5(M 1,i + F +,i ) and 0.5(M 1,i + F −,i ) and computed their distance from the base-pair center Γ i . Again, in order to account for the van der Waals radii, we estimated the final groove depths by subtracting 0.35 nm from the calculated distance. 85 Determination of Elastic Constants. In order to interpret single-molecule experiments, dsDNA is often modeled as an elastic rod. When a pulling force f and a The crookedness is obtained by computing the arccosine of the ratio between the end-to-end distance (black arrow) and the contour-length of DNA (red line). These quantities are obtained along the line formed by the points Γ i , which are determined as the centers between bases belonging to the same pair (inset). (c) Grooves are defined by considering the lines interpolating the phosphate beads (brown and black lines). For any couple of phosphates on the first strand (P 1,i and P 1,i+1 in this case), we define the midpoint (M i ). From the midpoint, we find the closest points on the second strand. The groove widths are obtained as the corresponding minimum distances (orange segments), suitably shifted to account for the excluded volume of the backbone. (d) The h-twist is defined by considering the vectors ζ i ≡ S 2,i − S 1,i joining the two sugars within each base. The vectors ζ i are projected onto the plane perpendicular to the helical axis, thus obtaining ζ i r . The h-twist is then defined as the angle depicted in red, corresponding to cos h-twist = ζ i r ·ζ i+1 r ·e) The h-rise is defined by considering the geometrical centers of the sugars ξ i ≡ (S 1,i + S 2,i )/2 and projecting the vector separating two consecutive centers onto the helical axis, thus obtaining h-rise = (ξ i+1 − ξ i )·h, corresponding to the red segment in the figure. torque τ are exerted along the direction of the rod, the elastic energy U reads 22,31 θ θ τ θ where S is the stretching modulus; C is the twist modulus; L is the extension; θ is the twist in radians; ΔL ≡ L − L 0 ; and Δθ ≡ θ − θ 0 , with L 0 and θ 0 being the equilibrium extension and twist at zero force and torque. On the right-hand side of eq 8, the first and second terms are stretching and twisting energy, respectively; the third term is the stretch−twist coupling energy; the fourth and fifth terms are the work performed by the external force and torque, respectively. Simultaneous minimization with respect to extension and force (∂U/∂ΔL = 0 and ∂U/∂Δθ = 0) leads to the following equations for the elastic equilibrium: where S̃≡ S − g 2 /C is the effective stretching modulus. From the previous formulas, the sign of g can be immediately ascertained by looking at the behavior of ΔL as a function of τ at constant force, with positive and negative correlation implying g < 0 and g > 0, respectively (an analogous response is obtained at constant τ for Δθ as a function of f). For a quantitative assessment of the elastic constants, three equations are needed. Different strategies were pursued for the two simulation protocols considered in this work. Benchmark Simulations. In this case, the torsion angle is estimated as θ = ∑ i = 1 n−1 h-twist i,i+1 . The extension is computed as L = |Γ n − Γ 1 |. Moreover, no torque is applied, τ = 0. From eq 9, one has ΔL = A 1 f and Δθ = A 2 f, with A 1 ≡ L 0 /S̃and A 2 ≡ −gL 0 /CS. The third equation is provided by applying the equipartition theorem to eq 8 and neglecting the twist−stretch coupling term. 31 Fitting the linear dependence of L and θ on the pulling force f, we computed the constants L 0 , θ 0 , A 1 , and A 2 . var(Δθ) was instead calculated directly from the simulations performed at f = 1 pN. Plugging the values of these quantities into eq 10, we finally obtained the values of S, C, and g reported in Figure 6. As for the crookedness constant k β , following ref 52 we define it via the relation cos β = cos β 0 (1 + f/k β ), where β 0 is the crookedness at zero force. k β is then suitably extracted from the slope of cos β versus f.
Stretch−Torsion Simulations. Taking advantage of the directionality imposed by the external force, the torsion angle in this case is computed as θ = ∑ i = 1 n−1 ψ i,i+1 , where the angle ψ is calculated as the h-twist but considering the rejection of ζ i from the direction of the force instead of the helical axis. In the same way as done above, for τ = 0 one has ΔL = A 1 f, with Fitting the linear dependence of ΔL on the pulling force f at τ = 0, we computed the constants L 0 and A 1 . Analogously, fitting of ΔL and Δθ versus τ at f = 2 pN enabled the computation of A 3 and A 4 . Plugging the values of these quantities into eq 11, we finally obtained the values of S, C, and g reported in Figure  7.
Computation of Persistence Length. In order to compute the persistence length, for each base pair i we considered the geometrical center ξ i = (S 1,i + S 2,i )/2 of the sugar beads. By introducing the displacement vector R ij ≡ ξ j − ξ i , the ith tangent vector is defined as t̂i ≡ R i−5, i+5 /|R i−5,i+5 |. The contour length l separating the points of application of two consecutive tangent vectors is computed as the average modulus of the displacement vector. The overall correlation function c p is defined as where s is the contour length, while ⟨···⟩ th,i denotes the average over both the conformational ensemble and the value chosen for i. Therefore, c p computes the decay of the orientation of the tangent vectors by taking thermal fluctuations into account. Analogously, the static correlation function c s is defined as where t̂i min is the ith tangent vector computed on the minimum-energy structure and ⟨···⟩ i denotes the average over the value chosen for i. The static correlation function quantifies the decay of the orientation of tangent vectors along the conformation corresponding to the ground state. For each sequence, the minimum-energy conformation was estimated by simulated annealing: starting from the average structure obtained via the molecular builder, we ran 100 short simulations at 300 K, after which the temperature was quenched at 0 K, while maintaining the Debye length corresponding to 300 K (this ensured that the minimumenergy configuration corresponded to the electrostatics at the temperature of interest). The ground state was then estimated  (14) This is in fact equivalent to impose, for each sequence, that the harmonic relation 1/l p = 1/l s + 1/l d proposed by Trifonov and co-workers holds exactly. 56,87 Errors. Indeterminacies on the computed quantities were estimated according to the procedures below. Note that error bars are not present in the figures whenever they are smaller than the size of symbols.
Observables. For all coarse-grained simulations, several independent realizations were performed for each case, whose number depends on the particular set of simulations considered (see above). For each computed observable, the indeterminacy on the average was obtained as the standard error of the mean computed across the independent realizations. As for var(Δθ), the standard error of the variance was considered. In the case of the atomistic trajectories, the time series of each observable was decorrelated by performing a block analysis, from which the error was then estimated. 88 Elastic Constants. In order to account for the errors on the elastic constants, the following procedure was employed. In all cases, the constants were obtained starting from a linear relation y = ax + y 0 , where x is either the force or the torque, while y is a certain observable (e.g., the extension). Let = x y i , , 1, ..., i i be the average values of the variables, and let dy i be the error associated with y i (x i does not have an associated error). Assuming the values of y to be independent and distributed normally, we iteratively considered sets of variables y ̅ i,α extracted from Gaussians with average y i and standard deviation dy i , where α = 1, ..., indicates the particular realization considered. For each such realization, we obtained the fitting parameters a α and y 0,α by linear regression. The mean slope was finally obtained as the average ∑ α α = a / 1 , while the associated error was estimated as the corresponding standard deviation. We checked numerically that a satisfying convergence was obtained for = 5000. Molecular Builder. The results obtained for the training sequences were employed to devise a molecular builder, which provides sequence-dependent average structures to be used as initial configurations. For each step XY, with both X and Y chosen among the four bases A, C, G, and T, we considered the simulations run for the training sequence which contains it. For instance, the sequence PolyAA was considered for the steps AA and TT. The corresponding trajectories were aligned in order to minimize the root-mean-square deviation (rmsd) of the central step XY, computed according to the Kabsch algorithm. 89 After this operation, the average coordinates of the 10 beads belonging to the step (four sugars, four bases, and two phosphates) were computed. These coordinates can be employed to build the average structure of a molecule, where the junctions between adjacent steps are aligned in order to minimize the rmsd of the overlapping base pair. As an example, let us consider the molecule with sequence 5′-ACT-3′. Two steps are present: AC and CT. The molecule is initiated by considering the average coordinates of the step AC. At this point, we note that the four beads (two sugars and two bases) belonging to the second base pair of the step AC are the same as the four beads belonging to the first base pair of the next step, CT. The average coordinates of the latter are then translated and rotated in order to minimize the rmsd between the two sets of coordinates for the overlapping beads. The final coordinates for the four overlapping beads are computed as the mean of the two aligned sets. This procedure can be iterated for molecules of arbitrary length. As a last step, the final structure is translated so that the origin corresponds to the first sugar bead and is subsequently aligned to the z axis. The molecular builder is provided at https://github.com/ saassenza/MADna/tree/main/LAMMPS and, using as input only the sequence, creates both the initial coordinates and the double-stranded topology in LAMMPS format.

■ RESULTS AND DISCUSSION
Summary of the Coarse-Grained Model. In line with previous work, 57,65 the coarse grain considered in MADna describes each nucleotide by means of three beads, each centered on the sugar, phosphate group, and base, respectively ( Figure 1). The beads interact with each other via steric interactions and, in the case of phosphate groups, electrostatic repulsion. The double-stranded molecular structure is maintained by introducing bonded interactions (eqs 1, 2, and 3 and Figure 3), with parameters depending on the sequence up to the level of a single step. The parameters were determined by Boltzmann inversion of atomistic trajectories from the literature, 52 whose sequences encompass all the possible steps (hereby referred to as training sequences, see section S3.1 in the Supporting Information for details). Further tuning was performed in order to reproduce the elastic response to an external pulling force computed for the same set of sequences. Full details of the model and the coarse-graining procedure are found in the Methods and in Sections S1 and S2 in the Supporting Information.
MADna Reproduces Conformational Features from Atomistic Simulations. In order to benchmark the optimized simulation setup, we proceeded with a systematic comparison between coarse-grained predictions and results from atomistic simulations. We performed coarse-grained simulations of dsDNA molecules under the action of a pulling force f ranging in the interval 1−20 pN, following the same protocol as in refs 46 and 52 (see Methods for details on the implementation). Apart from the training sequences, we considered a second, independent set of molecules (testing sequences), which has also been studied in refs 46 and 52. The testing sequences are reported in Section S3.1 in the Supporting Information and include biologically relevant structures as well as synthetic A-tracts fragments. Both conformational and elastic properties were considered for the analysis.
The study of various conformational quantities is reported in Figure 5. For each observable, a scatter plot is depicted to compare coarse-grained and atomistic results at the smallest force considered, f = 1 pN. Training and testing sequences are indicated by red circles and green diamonds, respectively, while the black line indicates the bisector of the first and third quadrant. Overall, an excellent agreement is found, as indicated by the localization of the plotted points in the vicinity of the bisector and by the large values attained by the Pearson coefficient for all the considered quantities. In Figure 5a, we plot the crookedness β, 52 a dimensionless parameter accounting for the global deviation of the helical axis with Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article respect to a straight line, i.e., the presence of a spontaneous curvature. Larger values of β correspond to more curved structures, and it was found that A-tracts show the straightest conformations. 52 In Figure 5b, we analyze the helical diameter. Also in this case, the coarse-grained model faithfully captures the dependence on sequence within a range spanning about 0.2 nm, although a slight overestimation of the atomistic values can be appreciated. In Figure 5c−h we plot the comparison of various helical features, namely h-rise (Figure 5c), h-twist (Figure 5d), and groove geometry (Figure 5e−h). The coarse -grained predictions closely follow the sequence dependence of the atomistic values, although a systematic overestimation of about 0.2 nm is present in the case of the depth of the major groove (Figure 5g). Finally, in Figure 5i we compare the values for the local dihedral SBBS, i.e., the dihedral formed by the sugars and bases within each base pair (inset in Figure 5i). Again, MADna reproduces with high precision the large variability induced by sequence heterogeneity, showing its ability to capture features emerging from the interaction between neighboring base pairs. The close agreement between coarse grain and atomistic simulations indicates the high accuracy of MADna in describing the conformational properties of dsDNA. Several striking features are worth mentioning. First, as mentioned above, the testing sequences were not employed to build the model, so that in this case the coarse-grained results are pure predictions. Second, none of the quantities reported in Figure  5 were directly used to build the model; thus, a quantitative agreement is not trivial also in the case of the training sequences. Third, these observables are related to different scales, encompassing a single base pair (SBBS), a base step (htwist and h-rise), multiple-step geometry (grooves), and the molecule as a whole (β and diameter). Fourth, the selected quantities have a stark dependence on sequence, as shown by their wide range of variability, indicating that the model captures also this intrinsic heterogeneity. Particularly striking is the case of the dihedral SBBS: despite the only two possible choices for the bases (the Watson−Crick pairs AT and CG), this quantity displays a strong variability, ranging from −30 to 10 deg (Figure 5i). This is evidently an emergent behavior induced by the interaction of these base pairs with their neighbors and is perfectly reproduced by the coarse-grained model.
MADna Reproduces Elastic Constants Obtained from Atomistic Simulations. Next, we focused on the elastic properties of training and testing sequences ( Figure 6). The effective stretching modulus S̃ (Figure 6a) and the crookedness elastic constant k β (Figure 6b) are related to the elastic response of extension and crookedness to the external force, respectively, and are obtained as the slopes of the corresponding observables as a function of f (see Methods and Figure S3 in the Supporting Information). The torsional modulus C (Figure 6c) accounts for the change of the h-twist upon application of an external torque. For the present setup, C was computed by analyzing the fluctuations of the h-twist for f = 1 pN. Finally, the twist−stretch coupling constant g ( Figure  6d) quantifies the torsional response to the external force. The negative sign displayed by g (Figure 6d) implies that the molecule overwinds when stretched, in agreement with experimental observations. 31,32 Precise definitions of the elastic constants and their computation are reported in the Methods.
As discussed in the Methods and in Section S2 in the Supporting Information, the values of S, k β , C, and g for the training sequences were used to refine the parameters of the bonded interactions obtained by the Boltzmann inversion of the atomistic trajectories. Due to the large number of bonded parameters, we opted for a pragmatic approach and adjusted a minimal subset of parameters. Particularly, we empirically observed that a major impact on the elastic constants was obtained by tuning the rigidities of bond 5′-BB-3′, angle 5′-PSB-3′, and dihedral 5′-SPSP-3′. Interestingly, these three bonded interactions have a clear physical meaning (compare Figure 3). Indeed, the bond 5′-BB-3′ accounts for stacking interactions, which are likely to affect the stretching stiffness 52 as well as the coupling between twist and stretch deformations. Moreover, the range of values encompassed by the angle 5′-PSB-3′ is related to the flexibility of the sugar pucker, which has been shown to play a key role in dsDNA elasticity. 46 Finally, the 5′-SPSP-3′ dihedral is likely to affect the backbone response and was found to be the quantity most affecting the crookedness rigidity. In this context, it is also worth mentioning that the BB-WC elastic constants for the pairs AT and CG approximately follow the ratio 2:3 (see Table S2 in the Supporting Information). The BB-WC bonds account for the interactions between Watson−Crick base pairs; hence, this ratio nicely reflects the presence of two and three hydrogen bonds for AT and CG, respectively.
In Figure 6, we show the scatter plots comparing S, k β , C, and g for both the training and the testing sequences, while the corresponding statistical indicators are reported in Table 1.
MADna closely captures the average and sequence-induced standard deviation of all the elastic quantities (Table 1). While this is expected for the training sequences, which were used to refine the model, the excellent agreement found for the testing sequences (average within 6% and standard deviation within 30% of the atomistic values) indicates the good performance of our model. The capacity of MADna to describe the detailed sequence dependence can be assessed by looking at Figure 6. MADna precisely accounts for S̃and k β (Figure 6a and Figure  6b, respectively), as indicated by the large values obtained for the Pearson coefficient (calculated considering only the testing sequences). In contrast, despite reproducing the average and standard deviation of C, MADna does not capture its detailed sequence dependence, as indicated by the low value of the Pearson coefficient. While this can be due to an intrinsic limit of our model, we note that also the input data set might be at fault: in contrast with S̃and k β , which are obtained by considering average values of the corresponding observables, C is determined by computing the fluctuations of the cumulated twist. Fluctuations are knowingly slower to converge when  Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article compared to averages, suggesting that the lack of accuracy of MADna might be due to sampling limitations of the atomistic simulations. Since to our knowledge there are currently no experiments addressing the sequence dependence of the twist modulus, and given the excellent performance of MADna in capturing the average value of C (see Figure 7c below), we decided to leave a thorough assessment of the matter to future investigation. Finally, the twist−stretch coupling g displays a good linear correlation. The modest value of the Pearson coefficient is an expected side effect of the large indeterminacy of the atomistic results, for which the error bars have sizes comparable to the dispersion of the average values. It has to be noted that some outliers are present; in particular for sequence A8T the coarse-grained model predicts a twist−stretch coupling markedly different from the atomistic result g ≃ 0.
Nevertheless, as discussed in Section S5.1 in the Supporting Information, this disagreement is likely to originate from a possible lack of convergence of the atomistic simulations at large forces for this specific sequence. Based on this observation, we excluded this point for the quantitative evaluation of the agreement between the two data sets by means of the Pearson coefficient reported in Figure 6d and for the corresponding statistical indicators in Table 1.
Comparison with Experiments: Sequence-Averaged Persistence Length. The persistence length l p quantifies the bending rigidity of a polymer and is possibly the most characterized mechanical property of dsDNA. [16][17][18][19]21,23,29,37,49 In its classical formulation from polymer theory, l p corresponds to the length of the fragment to be considered in order to observe significant thermally induced bending effects. 90 More technically, l p is defined as the decay length of the thermally averaged tangent-vector correlation function c p , that is, c p = exp(−s/l p ), where s is the contour length of the polymer fragment under consideration. 90 This definition relies on the assumption that the minimum-energy conformation for the polymer is that of a straight rod. In the case of dsDNA, according to the sequence a spontaneous curvature may be present. In order to account separately for intrinsic bending and thermal fluctuations, it is customary to distinguish the static (l s ) and dynamic (l d ) contributions to the persistence length, which characterize the decay length of the suitably defined correlation functions c s = exp(−s/l s ) (computed from the minimum-energy structure) and c d = exp(−s/l d ). 56 The three lengths are approximately related by a harmonic sum, 1/ l p ≃ 1/l s +1/l d . 87 In order to compute the persistence length and its contributions, we performed simulations for 20 random sequences of length 100 base pairs. The minimum-energy structures (see Methods) and the equilibrated trajectories were then employed to compute the correlation functions c p , c s , and c d according to eqs 12, 13, and 14, respectively. The results are reported in Figure S5 in the Supporting Information for various definitions of the tangent vector. Here, we focus on the results of one of such definitions, according to which the tangent vectors are obtained by joining the geometrical centers of sugars of base pairs separated by 10 steps (see Methods). Fitting the correlation function c p with an exponential decay resulted in computed values of the persistence length l p ranging between 46 and 64 nm, with an average equal to l p = 56 ± 1 nm. This is in good agreement with experimental values on random sequences and standard ionic conditions (45−55 nm [16][17][18][19]21,23,29,37,49 ), particularly since l p was not employed in the parametrization of the model. Other coarse-grained models give predictions for l p within the experimental range, 60,62,65,68,71 although in most of these works the persistence length of double-60,68,71 or single-stranded 65 DNA was employed as a target quantity in the construction of the force field. In the present case, the slight overestimation of l p is in line with previous results of atomistic simulations, where an average l p = 57 ± 3 was found. 49 As expected, 56 for each sequence the static correlation function c s is found to decrease more slowly than c p ( Figure S5 in the Supporting Information) since it lacks the disordering action of thermal fluctuations. The corresponding static persistence length l s was computed by fitting c s via an exponential decay, in analogy to c p . The obtained values vary widely, ranging from 199 to 796 nm, with an average equal to l s = 485 ± 42 nm. This is somewhat smaller than previous estimations (l s = 576 ± 191 nm 49 ), although we note that this value varies wildly by considering different sequences and, even for the same sequences, by employing different definitions (see below). As discussed in ref 49, its large heterogeneity may rationalize the markedly different results reported in experiments, where values of l s as different as 130 nm and >1000 nm have been estimated. 16,91 Fitting c d via an exponential decay yields values for the dynamic persistence length l d ranging between 56 and 77 nm, with an average equal to l d = 64 ± 1 nm. This prediction is in line with a previous report based on atomistic simulations 49 (l d = 64.7 ± 1.4 nm) and lies between the results obtained from cryoelectron microscopy 16 (l d = 82 ± 15 nm) and cyclization experiments 91 (l d = 50 ± 1 nm). The difference between the two experimental values might be ascribed to the different techniques employed, for which future simulations with the present model may shed some light.
As a final consideration, it has to be noted that the estimation of the persistence length may vary according to the definition of the tangent vectors. 56  Comparison with Experiments: Sequence-Averaged Elastic Constants. A further set of simulations was devoted to determine the sequence-averaged elastic constants. These quantities were already estimated for the training and testing sequences ( Figure 6), but their quantitative determination is likely to depend on the details of the microscopic definitions, particularly for the h-twist. In order to enable a quantitative comparison with experimental values, we designed a simulation framework which avoids relying on such definitions and which is more akin to single-molecule experimental setups.
As shown schematically in Figure 7a, in this set of simulations a dsDNA molecule is subjected to a constant force f and torque τ acting along the z axis. This enables defining unequivocally the twist angle θ starting from the projection of the base pairs onto the xy plane, in analogy with single-molecule experiments based on rotor beads. 31 The linear response of the extension L as a function of f provides access to the effective stretch modulus S. Analogously, the torsion modulus C can be computed from the dependence of θ on τ. Finally, the twist−stretch coupling g can be obtained by Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article looking at the cross-dependence, i.e., the dependence of L on τ or the response of θ to changes in f. The quantitative details can be found in the Methods. We performed simulations for five randomly generated sequences of length equal to 40 base pairs at several values of f and τ (see Methods for full details and Section S3.3 in the Supporting Information for the sequences, which are named ST1, ..., ST5). With this choice of sequence length, the molecules are long enough so that several turns of the double helix are present but short enough to neglect the effect of bending. For comparison, we also run simulations for the same set of sequences and parameters by employing the two most widely used coarse-grained models from the literature, namely oxDNA2 80,81 and the sequence-dependent model 3SPN2C. 82 From a qualitative perspective, a marked difference between MADna and the other two models becomes evident when analyzing the change in twist Δθ as a function of the force (Figure 7b). Indeed, the present model prescribes that dsDNA overwinds when stretched (Δθ > 0), which is in agreement with experimental observations. 31,32 In contrast, neither oxDNA2 nor 3SPN2C capture this feature, predicting unwinding upon pulling (in the case of oxDNA this fact was already observed in the original publication 68 ). To our knowledge, no other coarse-grained model available in the literature has been shown to predict this unintuitive behavior of dsDNA.
MADna shows better agreement with experiments also from a quantitative standpoint. As shown in Figure 7c and Table 2, it predicts values for S, C, and g in good quantitative agreement with the experiments. As for the other models, oxDNA2 shows a similar performance for C, while it tends to overestimate S. Coherently with the results shown in Figure 7b, the wrong sign for g is found for both oxDNA2 and 3SPN2C, with the latter showing in general a significant overestimation of the elastic constants. The results reported in Figure 7c and Table 2 correspond to forces larger than 10 pN, which was chosen as a reasonable threshold to avoid bending effects. Nevertheless, performing the analysis with the full set of forces resulted in a small change in S̃(roughly 10%) and in virtually no change in C and g. It is also worth mentioning that the range considered for the torque (0−30 pN·nm) corresponds to supercoiling densities below the threshold value σ ≃ 0.05 usually associated with torque-induced denaturation, 22 hence making the results obtained for MADna relevant for real dsDNA molecules within the whole range of applied torques. Particularly, we computed the supercoiling density as σ = Δθ/θ 0 , since the lack of bending results in the absence of relevant writhe. For the largest torque applied (τ = 30 pN·nm), we found σ ≃ 0.046 for MADna, σ ≃ 0.043 for oxDNA2, and σ ≃ 0.014 for 3SPN2C, which is lower than for the other two models due to the larger value of the twist modulus C.
The values of g obtained for MADna in Table 1 and Table 2 differ by a factor of 2 with respect to each other. This striking disagreement appears to be too large to be simply ascribed to the different sequences considered or the different definitions employed for the twist angle. In this regard, based on atomistic simulations it has been suggested that the stretching modulus depends on the size of the fragment under consideration. 94 Since in the present case we are comparing sequences of size 20 and 40 base pairs, it might be that a similar effect is present for g. In order to check this hypothesis, we performed another set of simulations following the same protocol as in the present section but considering sequences made of 20 base pairs,  which are listed as ST1-short, ..., ST5-short in Section S3.3 in the Supporting Information. The resulting elastic constants are reported in Figure S6 in the Supporting Information. Particularly, we found that for the short sequences g = −258 ± 16 pN·nm, which is indeed in line with the results reported in Figure 6. This confirms the presence of size effects in MADna for the prediction of elastic constants. To our knowledge, this is the first instance in which a length dependence for g has been predicted. This feature might explain the systematic overestimation of g in atomistic simulations that are restricted to short sequences. 46 Another interesting effect observed in experiments is the coupling between twist and bending, which is relevant at pulling forces up to a few piconewtons. 33,34 As a consequence of this coupling, the twist of the DNA molecule appears to be softer; i.e., the effective twist modulus is C eff < C. In order to study this feature, we considered a set of simulations involving DNA molecules of 150 base pairs, which is comparable to the persistence length, thus enabling the presence of bending fluctuations. The simulation protocol was similar to the case just analyzed, although here no torque was applied. We considered three independent sequences which are reported in Section S3.4 in the Supporting Information. For each sequence, simulations were performed for MADna, oxDNA2, and 3SPN2C at pulling forces ranging between 0.05 and 2.5 pN. Further simulation details can be found in the Methods. In Figure 8 we report the results obtained and compare them with the experimental values from refs 33 and 34. In line with the trend observed in Figure 7, MADna and oxDNA2 follow quite closely the experimental curve, while 3SPN2C systematically overestimates the value of C eff .
Comparison with Experiments: Sequence-Dependent Conformation and Elasticity. Having characterized and benchmarked the sequence-averaged mechanical properties of MADna, we now turn to sequence-dependent features. In this regard, we considered the sequences studied experimentally in ref 28, where the authors characterized the sequencedependent persistence length and helical repeat by means of cyclization experiments.
We performed simulations for DNA molecules of 100 base pairs obtained by taking the central parts of 14 different experimental sequences as listed in Section S3.2 in the Supporting Information. For each simulated sequence, we computed the helical pitch by dividing the cumulated helical twist by 2π, while for the persistence length we employed the same approach as above. In Figure 9 we report the comparison between experiments and MADna predictions. As the plots show, there is a high correlation between the two data sets, thus indicating that MADna satisfactorily captures the sequence dependence of two key features of DNA conformation (helical pitch) and elasticity (persistence length). From a quantitative perspective, we see from Figure 9a that the simulated values for the pitch are larger than the experimental ones. Nonetheless, a quantitative comparison has to be performed with care. Indeed, the experimental values were obtained indirectly by analyzing cyclization data by means of a wormlike chain with twist. 28 Although the relative differences observed in the experimental results for the different sequences are robust, the values depend on the validity of this model down to the scale of single steps and on the value assigned to the twist modulus when fitting the data. Moreover, it is likely that the details of the definition of the h-twist in the simulations affect the quantitative determination of the helical pitch. To our knowledge, this is the first instance in which these structural data have been compared to predictions from simulations. As for the persistence length, the strong correlation found has to be mitigated by the slight quantitative  overestimation of l p in simulations, in line with the results reported above for the sequence-averaged persistence length. Moreover, the values of l p from simulations appear to be more heterogeneous than in experiments. While this can be ascribed to intrinsic limitations of MADna, it has to be noted that the smaller dispersion of values determined from experiments might originate from some assumptions made in their analysis. Indeed, experimental persistence lengths were obtained by fitting j-factors from cyclization experiments, where the theoretical formulas depend on l p , the helical pitch, the twist modulus C, and the contour length L. 28 The data reported in ref 28 were obtained by assuming sequence-independent C and L, which neglects two sources of sequence-driven heterogeneity, suggesting that the values obtained in experiments might underestimate the actual range of variability of l p .
For comparison, we also performed simulations for oxDNA2 and 3SPN2C, for which the results are reported in Figure S7 in the Supporting Information. We found that oxDNA2 predictions do not correlate with experiments for either the helical pitch or the persistence length. This is expected, since in this model the sequence dependence is implemented by tuning the base-pairing and stacking interactions to account for thermodynamic data but not for the elasticity of DNA. In the case of 3SPN2C, we found a weak correlation for the helical pitch but a strong correlation for the persistence length. This high correlation was also expected, since the persistence-length data from ref 28 were used to parametrize the model. 82 For completeness, we mention that also CGDNA has been used to reproduce these experimental persistence-length data, for which a similar correlation as for MADna was found (r = 0.73). 56 As a further test, we performed stretching simulations for phased A-tracts, i.e., dsDNA molecules obtained by alternating fragments of consecutive adenines and random sequences, with each fragment having a length of 5−10 base pairs. It was experimentally found that the stretching modulus for such molecules is roughly 50% larger than for random sequences. 21 We run some stretching simulations for five phased A-tracts of 40 base pairs, whose sequences were extracted as fragments of the experimental ones 21 and are reported as A-tract-1, ..., Atract-5 in Section S3.3 in the Supporting Information. Simulation protocols and analysis were the same as in the case of random sequences studied above. The results from the three models were S̃= (1402 ± 48) pN for MADna, S̃= (1444 ± 2) pN for oxDNA2, and S̃= (2793 ± 67) pN for 3SPN2C. By comparing these values with the results reported in Table 2, we thus conclude that MADna predicts an increase of roughly 35%, oxDNA2 does not predict any increase, and 3SPN2C predicts a 8% increase. As expected, the absence of elasticityoriented sequence dependence for oxDNA2 prevents it from capturing this feature. The prediction of MADna is the one most closely resembling the significant increase observed experimentally, although still underestimating its extent.
Strengths and Limits. The comparison between MADna, oxDNA2, 3SPN2C, and experiments can be summarized as follows. MADna performs in general better at capturing both sequence-averaged and sequence-dependent conformational and elastic features. As for the other two models, oxDNA2 reproduces well the sequence-averaged properties, but it cannot account for sequence dependence. In contrast, 3SPN2C captures well the sequence dependence of elasticity in the case of the persistence length, while its accuracy for the change in stretch modulus or conformation is more limited.
Moreover, it tends in general to overestimate the magnitude of the elastic constants.
The accuracy of MADna in addressing the sequence dependence of conformational and elastic properties of dsDNA appoints it as the ideal choice to interpret the outcome of single-molecule experiments, as well as to study the conformational changes induced by mechanic stress, which are relevant for many systems in vivo. Moreover, it can be employed as a virtual laboratory to test analytical theories on DNA elasticity. 95,96 Nonetheless, at the present stage MADna cannot account for breaking events such as the formation of kinks or local melting. Hence, it is important to assess the relevance of such mechanisms for the system under study before drawing conclusions based on simulations performed with the present model. We are currently working to surpass such limitations, in order to further extend the palette of possible systems which can be analyzed through the lens of MADna.
Further limitations are introduced by the use of an implicit solvent, which prevents studying DNA solvation. Similarly, the use of reduced charges interacting via the Debye−Huckel potential does not account for ion−ion correlations, which are particularly relevant for multivalent ions. 97 We note that MADna shares these limitations with all the coarse-grained models based on a similar philosophy, including, e.g., oxDNA2 and 3SPN2C. In view of future modeling of the interaction of DNA with proteins by means of MADna, the potential obstacle provided by these limitations can however be overcome by considering a distance-dependent dielectric constant for the electrostatic interaction between charged sites on DNA and proteins. 98 LAMMPS Implementation and Availability. The standard potentials employed in MADna enable a straightforward implementation of the model in all common Molecular Dynamics simulators. The molecular builder, the main scripts used for the analysis, and some sample scripts for simulation in LAMMPS are provided at https://github.com/saassenza/ MADna/tree/main/LAMMPS. The molecular builder takes as input the sequence and provides both the topology and the initial coordinates in LAMMPS format. In order to run MADna in LAMMPS (at the time of writing, the stable version is 29Sep2021), the latter has to be built with the packages MOLECULE, EXTRA-MOLECULE, and EXTRA-PAIR.

■ CONCLUSION
We have introduced MADna, a novel coarse-grained model for the simulation of double-stranded DNA. MADna captures the sequence dependence of conformational and elastic properties with accuracy comparable to that of atomistic simulations. Key conformational features which closely follow atomistic results include the main helical parameters, the groove geometry, the diameter of the double helix, and the spontaneous curvature as quantified by the crookedness. The model also predicts sequence-averaged and sequence-dependent elasticity and conformation in agreement with experimental results for a wide set of features, namely the stretching and torsion moduli, the counterintuitive negative twist−stretch coupling, the twist−bend coupling, the persistence length, and the helical pitch. At the present stage, the model does not account for breaking events, which are being addressed in ongoing work. The implementation of MADna in Molecular Dynamics software is straightforward due to the common potentials employed. Sample scripts for LAMMPS and a molecular builder are openly provided on GitHub.
Due to both its accuracy and its simplicity of use, we believe that MADna provides a significant addition to the toolbox of coarse-grained simulations and will enable precise theoretical studies of a wide set of large-scale DNA phenomena.
Full list of the parameters of the model, additional analysis, and additional methods (PDF)

■ AUTHOR INFORMATION
Corresponding Author